Automatic Wikibook Prototyping via Mining Wikipedia

Authors

  • Jen-Liang Chou
  • Shih-Hung Wu
Abstract

Wikipedia is the world’s largest collaboratively edited source of encyclopedic knowledge. Wikibook is a sub-project of Wikipedia that is intended to create books that can be edited by various contributors, similar to how Wikipedia is composed and edited. Editing a book, however, requires more effort than editing separate articles. Therefore, quickly prototyping a book is a new research issue. In this paper, we investigate how to automatically extract content from Wikipedia and generate a prototype of a Wikibook as a starting point for further editing. Applying search technology, our system retrieves relevant articles from Wikipedia. A table of contents is built automatically using a two-stage search method. Our experiments show that, given a keyword as the title of a book, our system can generate a table of contents, which can be treated as a prototype of a Wikibook. Such a system can help with the editing of free textbooks. We propose an evaluation method that compares system results to a traditional textbook, and we report the coverage of our system.
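
The abstract does not spell out how the two search stages work or how coverage is computed, so the following is only a minimal sketch under stated assumptions: stage one searches for the book keyword to pick chapter articles, stage two searches for each chapter title to pick section articles, ranking is a plain term-frequency count, and coverage is the fraction of a reference textbook's entries that appear in the generated table of contents. The toy `CORPUS`, the function names, and the scoring rule are all hypothetical and stand in for a real Wikipedia search backend.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for Wikipedia articles (title -> body).
CORPUS = {
    "Supervised learning": "supervised learning is machine learning from labeled examples",
    "Unsupervised learning": "unsupervised learning is machine learning without labels such as clustering",
    "Decision tree": "a decision tree is a supervised model",
    "Cluster analysis": "cluster analysis is an unsupervised technique",
    "Cooking": "recipes and food",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query, corpus, k, exclude=()):
    """Return up to k titles ranked by query-term frequency (assumed scoring)."""
    terms = set(tokenize(query))
    scored = []
    for title, body in corpus.items():
        if title in exclude:
            continue
        counts = Counter(tokenize(title + " " + body))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]

def build_toc(keyword, corpus, chapters=2, sections=2):
    """Stage 1: keyword -> chapters. Stage 2: each chapter title -> sections."""
    chapter_titles = search(keyword, corpus, chapters)
    used = set(chapter_titles)
    toc = {}
    for chapter in chapter_titles:
        found = search(chapter, corpus, sections, exclude=used)
        used.update(found)  # an article appears at most once in the book
        toc[chapter] = found
    return toc

def coverage(toc, reference_entries):
    """Fraction of reference textbook entries present in the generated TOC."""
    generated = {t.lower() for ch, secs in toc.items() for t in [ch, *secs]}
    reference = {e.lower() for e in reference_entries}
    return len(generated & reference) / len(reference) if reference else 0.0

toc = build_toc("machine learning", CORPUS)
for chapter, section_titles in toc.items():
    print(chapter, "->", section_titles)
```

Comparing the result against a hypothetical textbook outline, for example `coverage(toc, ["Supervised learning", "Decision tree", "Neural network", "Cluster analysis"])`, gives the share of textbook topics the prototype recovers. A real system would replace `search` with queries against a Wikipedia index.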


Related resources

Automatic Wikibook Prototyping

Wikipedia is the world’s largest collaboratively edited source of encyclopedic knowledge. Wikibook is a sub-project of Wikipedia. The purpose of Wikibook is to enable a free textbook to be edited by various contributors, in the same way that Wikipedia is composed and edited. However, editing a book requires more effort than editing separate articles. Therefore, how to help users cooperatively e...


Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies that need to be understood, and whose risks need to be analysed, before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with each other are often complex and form a network. Discovery of this network of technologies is time consumi...


Automatic Document Topic Identification Using Hierarchical Ontology Extracted from Human Background Knowledge

The rapid growth in the number of documents available to end users from around the world has led to a greatly increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. In this work, a novel technique is proposed to automatically construct a background knowledge structure ...


Mining Relations between Wikipedia Categories

The paper concerns the problem of automatic category system creation for a set of documents connected by references. The presented approach has been evaluated on the Polish Wikipedia, where two graphs, the Wikipedia category graph and the article graph, have been analyzed. The links between Wikipedia articles have been used to create a new category graph with weighted edges. We compare the created ca...


Taxonomic Relation Extraction from Wikipedia: Datasets and Algorithms

The dynamic and continuously growing category structure of Wikipedia has been used in numerous ontology extraction methods. We present a dataset of category subgraphs automatically extracted from Wikipedia that are manually annotated for is-a and instance-of relations in order to enable a more comprehensive evaluation of taxonomy mining approaches. We also show how the new dataset can be used w...



Journal:
  • IJCLCLP

Volume 13


Publication date: 2008